Q.  J.  R.  Meteorol.  Soc.  (2004),  DRAFT,  pp.  1-999 


doi:  10.1256/qj.03.n 


Semi-Implicit  Time-Integrators  for  a  Scalable  Spectral  Element 

Atmospheric  Model 

By  FRANCIS  X.  GIRALDO* 

Naval  Research  Laboratory,  Monterey,  CA  93943-5502  USA 
(Received  10  December  2003;  revised  22  November  2004;  accepted  1  April  2005) 

Summary 

The  Naval  Research  Laboratory’s  spectral  element  atmospheric  model  (NSEAM)  for  scalable 
computer  architectures  is  presented.  This  new  dynamical  core  is  based  on  a  high  order  spectral  element 
(SE)  method  in  space  and  uses  semi-implicit  methods  in  time  based  on  either  the  traditional  second 
order  leapfrog  (LF2)  or  second  order  backward  difference  formulas  (BDF2).  The  novelties  of  NSEAM 
are:  it  is  geometrically  flexible  and  thereby  can  accommodate  any  type  of  grid;  LF2  or  BDF2  are  used 
to  construct  the  semi-implicit  method;  and  the  horizontal  operators  are  written,  discretized,  and  solved 
in  three-dimensional  Cartesian  space;  .  The  semi-implicit  NSEAM  is  validated  using:  five  baroclinic 
test  cases;  direct  comparisons  to  the  explicit  version  of  NSEAM  which  has  been  extensively  tested  and 
the  results  previously  reported  in  the  literature;  and  comparisons  with  operational  weather  prediction 
and  well-established  climate  models.  A  comparison  with  the  U.  S.  Navy’s  spectral  transform  global 
forecast  model  illustrates  that  NSEAM  is  60%  faster  on  an  IBM  SP4  using  96  processors  for  the  current 
operational  resolution  of  T239  L30.  However,  NSEAM  can  accommodate  many  more  processors  while 
continuing  to  scale  efficiently  even  at  higher  grid  resolutions.  In  fact,  we  show  that  at  T498  L60,  NSEAM 
scales  linearly  up  to  384  processors. 

Keywords:  Backward  Difference  Formula  Grid  Hexahedral  Hydrostatic  Icosahedral  Leapfrog 
Primitive  Equations 


1.  Introduction 

The  current  trend  in  high  performance  computing  has  shifted  to  the  develop¬ 
ment  of  systems  having  tens  of  thousands  of  processors;  the  two  fastest  computers 
in  the  world  reported  at  Supercomputing  2004  have  32,000  (IBM,  BlueGene/L) 
and  10,000  (SGI)  processors.  In  fact  BlueGene/L  is  expected  to  reach  its  full 
capacity  of  131,000  processors  by  June  2005.  Therefore,  to  fully  exploit  this  type 
of  architecture  requires  utilizing  numerical  methods  that  rely  on  a  decomposition 
of  the  global  domain  into  a  multitude  of  smaller  subdomains.  Methods  that  rely 
on  domain  decomposition  are  known  as  local  methods  whereas  those  that  do 
not  are  referred  to  as  global  methods  because  they  require  the  information  of 
the  entire  global  domain  in  order  to  operate  on  a  specific  subdomain.  The  best 
example  of  a  global  method  is  the  spectral  transform  (ST)  method.  Examples  of 
local  methods  include  the  finite  difference  (FD),  finite  element  (FE),  and  finite 
volume  (FV)  methods.  However,  the  biggest  disadvantage  of  local  methods  is 
that  they  have  not  been  able  to  compete,  in  terms  of  accuracy,  with  ST  methods 
which  have  been  used  traditionally  in  operational  numerical  weather  prediction 
(NWP)  and  climate  models. 

Spectral  element  (SE)  methods  combine  the  local  domain  decomposition 
property  of  FE  methods  with  the  high-order  accuracy  and  weak  numerical 
dispersion  of  ST  methods.  SE  methods  have  shown  promise  in  many  areas  of  the 
geosciences  including:  seismic  wave  propagation  (Komatitsch  and  Tromp  1999), 
deep  Earth  flows  (Fournier  et  al.  2004) ,  climate  (Thomas  et  al.  2002  and  Fournier 
et  al.  2004),  ocean  (Iskandarani  et  al.  2002),  and  NWP  (Giraldo  and  Rosmond 
2004)  modeling.  These  methods  are  high-order  FE  methods  where  the  grid  points 
are  chosen  to  be  the  Legendre-Gauss-Lobatto  points.  In  Giraldo  and  Rosmond 
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(2004)  we  introduced  a  SE  atmospheric  model  with  an  explicit  leapfrog  time- 
integrator  that  was  shown  to  scale  linearly  while  achieving  accuracies  similar  to 
those  obtained  with  ST  models.  However,  in  that  paper  it  was  shown  that  the 
explicit  SE  model  would  only  outperform  ST  models  at  resolutions  beyond  T406. 
In  order  to  surpass  ST  models  at  resolutions  below  T406  requires  upgrading  the 
time-integrator  from  explicit  to  semi-implicit.  Explicit  time-integrators  are  too 
inefficient  for  NWP  applications  because  the  fast  moving  gravity  waves  require 
the  use  of  small  time  steps  to  maintain  stability.  In  order  to  ameliorate  this  rather 
stringent  time  step  restriction,  researchers  have  discretized  the  gravity  wave  terms 
implicitly  in  time  and  the  Rossby  wave  terms  explicitly;  this  is  the  idea  behind 
the  semi-implicit  method  (Kwizak  and  Robert  1971). 

In  this  paper  we  describe  a  new  semi-implicit  Eulerian  atmospheric  model 
based  on:  the  SE  method  in  space  where  the  horizontal  operators  are  written, 
discretized,  and  solved  in  three-dimensional  (3D)  Cartesian  space;  and  a  dis¬ 
cretization  in  time  by  the  second  order  leapfrog  (LF2)  and  backward  difference 
formulas  (BDF2).  Eulerian  atmospheric  models  typically  use  LF2  for  their  semi- 
implicit  method;  examples  include  the  U.  S.  Navy  (Hogan  and  Rosmond  1991) 
and  the  National  Center  for  Environmental  Prediction  (Tremolet  and  Sela  1999). 
The  reasons  for  experimenting  with  BDF2  are: 

i.  BDF2  are  absolutely  stable  in  the  region  of  interest  for  these  equations  (for 
real  and  distinct  eigenvalues); 

ii.  the  resulting  computational  modes  are  damped  and,  therefore,  no  time-filter 
is  required  (time-filters  diminish  the  order  of  accuracy) ; 

iii.  the  resulting  pseudo-Helmholtz  matrix  has  a  smaller  condition  number  than 
the  one  obtained  with  the  LF2  method  which  translates  into  fewer  iterative 
solves  per  time  step  and  results  in  a  more  efficient  model. 

However,  BDF2  is  by  no  means  ideal  for  this  class  of  equations  (Hamiltonians) 
and  their  weaknesses  will  be  discussed  in  Sec.  2. 

The  advantages  of  using  Cartesian  coordinates  are:  the  pole  singularity  which 
plagues  the  equations  in  spherical  coordinates  disappears  and  the  numerical 
model  is  completely  independent  from  the  grid.  Because  the  numerical  method 
is  constructed  independently  from  the  grid,  this  then  permits  any  grid  to  be 
used  including:  icosahedral,  hexahedral,  telescoping,  and  adaptive  unstructured 
grids.  This  independence  from  the  grid  is  not  shared  by  any  of  the  existing  and 
newly  proposed  global  atmospheric  models  including  the  FD  model  in  Davies  et 
al.  (2005),  the  FE  model  in  Cote  et  al.  (1998),  the  FV  model  in  Lin  and  Rood 
(1996),  the  icosahedral  models  in  Randall  et  al.  (2002)  and  Majewski  et  al.  (2002), 
the  SE  model  in  Thomas  et  al.  (2002),  and  the  ST  models  in  Hack  et  al.  (1992), 
Hogan  and  Rosmond  (1991),  Temperton  et  al.  (2001),  and  Tremolet  and  Sela 
(1999).  In  fact,  the  formulations  of  all  these  models  are  restricted  to  a  specific 
grid  geometry. 

Element-based  Galerkin  (EBG)  methods,  such  as  the  SE  method,  offer  many 
more  benefits  in  addition  to  permitting  the  use  of  any  grid.  For  example,  to  switch 
from  quadrilateral  to  triangular  elements  merely  requires  changing  the  basis 
functions,  and  the  associated  quadrature  points  and  weights  as  is  done  in  Giraldo 
and  Warburton  (2005)  which  are  being  considered  for  future  implementation  into 
NSEAM.  Changing  from  globally  conservative  to  locally  conservative  methods 
only  requires  changing  the  element  boundary  conditions  to  account  for  fluxes  as 
is  done  in  Giraldo  et  al.  (2002);  this  then  simplifies  the  construction  of  adaptive 
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solutions  in  addition  to  giving  a  fully  conservative  method  (the  development  of 
non-hydrostatic  ocean  and  atmospheric  models  using  this  approach  is  currently 
underway).  In  short,  whenever  a  new  contribution  from  approximation  theory 
emerges,  the  new  basis  functions  (and  associated  quadrature)  can  be  easily 
implemented  into  the  existing  EBG  model;  an  example  is  the  current  work  on  non¬ 
polynomial  expansions  derived  from  the  prolate  spheroidal  wave  functions  (Boyd 
2004)  which  we  are  currently  testing.  This  flexibility  in  EBG  methods  allows  an 
existing  model  to  adapt  to  the  changing  needs  in  science  and  computing  which 
justifies  the  further  development  of  SE  models  and  should  ensure  their  longevity. 

The  objective  of  the  present  work  is  to  introduce  a  new  grid  point  semi- 
implicit  atmospheric  model  which: 

i.  is  spectrally  accurate; 

ii.  is  highly  scalable  on  distributed-memory  computers; 

iii.  allows  for  the  use  of  any  type  of  grid; 

iv.  facilitates  its  continuing  augmentation  in  accuracy  and  efficiency  due  to  its 
element-based  construction. 

The  present  work  essentially  extends  the  explicit  time-integrator  of  NSEAM  to 
semi-implicit.  For  convergence  rates  of  the  discrete  horizontal  operators  the  reader 
is  referred  to  the  article  by  Giraldo  and  Rosmond  (2004). 

The  remainder  of  the  paper  is  organized  as  follows.  Section  2  describes 
the  construction  of  the  semi-implicit  time-integrators.  Section  3  contains:  a 
description  of  the  implementation  of  the  model  on  distributed- memory  computers 
using  the  Message-Passing  Interface  standard;  a  scalability  comparison  between 
NSEAM  and  the  U.  S.  Navy’s  Operational  Global  Atmospheric  Prediction  System 
(NOGAPS);  and  a  performance  comparison  between  the  LF2  and  BDF2  semi- 
implicit  time-integrators  of  NSEAM.  In  Sec.  4  the  results  for  the  five  test  cases 
used  to  validate  the  model  are  presented.  Finally,  in  Sec.  5  we  summarize 
the  key  findings  of  this  research.  For  completeness,  Appendix  A  contains  a 
description  of  the  semi-implicit  method  applied  to  the  hydrostatic  primitive 
equations  discretized  in  time  by  a  general  second  order  time-integrator  and  in 
space  by  the  SE  method  in  a  Cartesian  coordinate  system. 


2.  Semi-Implicit  Time- Integrators 


The  governing  equations  solved  in  the  present  work  are  the  hydrostatic 
primitive  equations  (HPE).  We  assume  an  adiabatic  atmosphere  (i.e.,  no  diabatic 
forcing)  and  thus  only  take  into  account  the  dynamical  processes.  Let  the  HPE 
be  written  in  the  following  vector  form 

where  q  =  (-7T,  u!  .  6 )  is  the  state  vector  containing  the  prognostic  variables, 
u  =  (w,  v,  w)T ,  T  is  the  transpose  operator,  and 

(  V  •  (t ru)  +  ^  (mr)  \ 


S{q)  =  - 


a  •  v  u  i |-  ■ 


da 


(2) 
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V 


u.ve  +  ofa 
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is  the  source  vector  function.  For  closure  we  require  the  hydrostatic  equation 


d<j) 

dP 


—  Cp8. 


(3) 


For  completeness,  we  define  the  terms  contained  in  these  equations.  The  terms 
(a,  fl)  are  the  radius  and  angular  rotation  of  the  earth,  respectively.  The  prog¬ 
nostic  variables  are:  ft  =  ps  ~  Pt  where  ps  is  the  surface  pressure  and  pt  is  the 
pressure  at  the  top  of  the  model;  the  wind  velocities,  u  =  ( u ,  v.  w)T ;  and  the 
potential  temperature,  0.  The  diagnostic  variables  are:  the  vertical  velocity,  er; 
the  geopotential  height,  4 !>;  and  the  pressure,  p.  Other  variables  requiring  defini¬ 
tion  are:  the  Cartesian  coordinate  of  the  grid  points,  x:  the  vertical  coordinate, 
0  <  <7  <  1,  defined  from  the  top  of  the  atmosphere  to  the  surface  of  the  planet;  the 
Exner  function,  P;  and  the  coefficient  of  specific  heat  for  constant  air  pressure, 
cp  .  Finally,  the  term  //  is  a  Lagrange  multiplier  required  only  because  we  use 
a  3D  momentum  equation  in  Cartesian  coordinates  to  represent  the  correspond¬ 
ing  2D  momentum  equation  in  spherical  coordinates  (see  Cote  1988).  With  the 
equations  defined  we  can  now  proceed  to  the  description  of  the  semi-implicit 
time-integrators. 


(a)  General  Form  of  Second  Order  Semi- Implicit  Time- Integrators 

Before  describing  the  implementation  of  the  semi-implicit  (SI)  method  it  is 
crucial  to  understand  which  terms  must  be  discretized  implicitly.  The  maximum 
characteristic  wave  speed  of  the  HPE,  Eqs.  (1)  and  (2),  is  given  by  U  +  \ff> 
where  U  =  u  ■  n  is  the  wind  speed  along  the  direction  n  and  is  the  speed  of 
the  gravity  waves.  The  fastest  gravity  waves  may  travel  up  to  six  times  faster  than 
the  fastest  wind  velocities.  In  the  semi-implicit  (SI)  method  the  terms  responsible 
for  the  propagation  of  the  gravity  waves  are  treated  implicitly  and  the  remaining 
terms  explicitly.  This  essentially  slows  down  the  gravity  waves  which  does  not 
adversely  affect  the  medium-range  forecast  skill  because  they  only  carry  a  small 
amount  of  energy.  It  should  be  mentioned  that  there  exist  alternatives  to  the 
SI  method  with  the  leading  contender  perhaps  being  the  Jacobian-free  Newton- 
Krylov  method  (see  Knoll  and  Keyes  2004,  for  example)  in  which  the  entire  set 
of  equations  are  solved  fully  implicitly  in  time.  However,  in  the  present  work  we 
shall  only  consider  the  SI  approach. 

In  Eq.  (2)  the  source  terms  contributing  to  the  propagation  of  gravity  waves 
(G)  are 


S°(q) 


(  V  •  (ftu)  +  £  (ttct)  > 

+  7T 


(4) 


V 


J 


We  then  seek  a  solution  to  the  equations  recast  in  the  following  form 

^  =  {S(q)  ~  $LG{q)}  +  S  [LG{q)]  (5) 

where  the  terms  inside  the  curly  brackets  are  time-integrated  explicitly,  those 
inside  the  square  brackets  implicitly,  LG  (defined  in  Eq.  (A. 2))  represents  the 
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linearization  of  SG ,  and  6  =  0  or  1  depending  on  whether  the  method  is  purely 
explicit  or  semi-implicit. 

To  simplify  the  following  discussion,  let  us  write  the  integration  of  Eq.  (5)  in 
the  following  general  form 

12  2 

qn+1  =  amqn~m  +  7  At  Y,  PrnS(q)n-m  +  h At  Y  PmL(q)n-m  (6) 
m= 0  m= 0  m=— 1 

where  Table  1  lists  the  associated  coefficients  corresponding  to  the  BDF2  and 
LF2  methods  and  '0  is  the  explicit /implicit  weighting  with  ■()  =  0.5  yielding  the 
trapezoidal  rule  (also  known  as  Crank-Nicholson).  The  BDF2A  method  shown  in 
Table  1  is  the  method  proposed  by  Karniadakis  et  al.  (1991)  for  the  incompressible 
Navier-Stokes  equations  and  used  by  Shen  and  Wang  (1999)  for  the  HPE  while 
the  BDF2B  method  was  proposed  by  Hulsten  (1996)  but  has  not  been  used  or 
further  studied.  Note  that  unlike  the  BDF2  methods,  the  LF2  method  requires 
the  application  of  the  following  time  filter  (Asselin  1972) 

qn  =  qn  +  e  ( qn+1  -  2 qn  +  qn~l)  (7) 

where  q  denotes  the  time-filtered  variable  with  the  time-filter  weight  e. 


Method 

cto 

Oil 

7 

/3o 

Pi 

P-2 

P- 1 

PO 

Pi 

P2 

BDF2A 

4/3 

WI73- 

2/3 

2 

-1 

0 

1 

-2 

1 

0 

BDF2B 

4/3 

-1/3 

2/3 

8/3 

-7/3 

2/3 

1 

-8/3 

7/3 

-2/3 

LF2 

0 

1 

2 

1 

0 

0 

■d 

-1 

1 

0 

TABLE  1.  Backward  difference  formulas  (BDF2)  and  leapfrog  (LF2)  time-integrators  and  their 
associated  coefficients  corresponding  to  Eq.  (6). 


By  exploiting  the  linearity  of  the  L  operator  we  can  rewrite  Eq.  (6)  as 

PmS(q)n-m  +  6'yAtL 

m= 0  m= 0 

Furthermore,  we  can  simplify  this  equation  by  extracting  the  fully  explicit 
solution  from  Eq.  (8)  as  follows 

1  2 

^explicit  =  amqn~m  +  7 A t  £  f3mS(q)n-m.  (9) 

m= 0  m= 0 


1  2 
qn+1=Yarnqn-m+lAtY 


Multiplying  Eq.  8  by  p_i,  and  adding  Y?m= o  ™  yields 


qtt  =  Q  +  h^tp-xL  (. qtt ) 


where 


q  =  p-iq 


explicit 


icit  _|_  ^  ^ 


Pmq 


m= 0 


and 


E 

m=  —  1 


„  _  x  _  nn~m  —  r.  _L_  \  '  „  rJl-m 

qtt  —  7 ^  Pmq  —  p-iq  +  /  y  Pmq 


m= 0 


(10) 

(li) 


(12) 
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Equation  (10)  is  the  form  that  we  use  for  the  construction  of  the  semi-implicit 
method  along  with  the  definitions  in  Eqs.  (9),  (11),  and  (12).  The  subscript  tt  in 
the  semi-implicit  state  vector  q  is  meant  to  emphasize  the  similarity  between 
the  semi-implicit  correction  and  a  temporal  second  order  derivative.  This  is 
quite  evident  for  BDF2A  where  qtt  =  qn+l  —  2qn  +  qn_1.  Note  that  the  form 
given  in  Eq.  (10)  is  only  possible  by  using  the  linearization  in  Eq.  (5)  which 
allows  the  semi-implicit  method  to  be  written  as  a  correction  to  the  explicit 
method.  This  can  be  seen  by  taking  5  =  0  and  equating  Eqs.  (12)  and  (11)  which 
results  in  qn+1  =  r/xplicit  .  It  should  be  mentioned  that  constructing  the  semi- 
implicit  method  as  a  correction  to  an  explicit  method  as  shown  in  Eq.  (10) 
has  been  adopted  universally  by  many,  if  not  most,  of  the  operational  NWP 
centers  including:  the  European  Center  for  Medium-Range  Weather  Forecasting 
(see  Ritchie  et  al.  1995),  the  National  Center  for  Environmental  Prediction,  and 
the  U.  S.  Navy. 


Figure  1.  The  stability  of  the  explicit  versions  of  a)  BDF2A,  b)  BDF2B,  c)  LF2  with  e  =  0,  and  d)  LF2 

with  e  =  0.05. 


In  the  following  discussion  we  use  the  equation 


dq 

dt 


=  ikq 


to  evaluate  the  stability  properties  of  BDF2  and  LF2.  In  Fig.  1  we  show  the 
stability  region  for  the  explicit  formulations  of  BDF2  and  LF2.  Because  BDF2 
and  LF2  both  yield  multiple  numerical  solutions  to  a  first  order  equation  this 
then  means  that  only  one  solution  is  physical  while  the  others  are  purely 
computational.  The  solid  lines  represent  the  physical  solutions  and  the  dashed 
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lines  are  the  computational  modes.  BDF2A  clearly  is  inferior  to  the  other  two 
methods  because  it  becomes  unstable  quite  early  near  &Af  =  0.15  (Fig.  la); 
however,  the  computational  solution  remains  damped  and  thereby  does  not 
require  the  use  of  a  filter.  The  physical  solution  of  the  BDF2B  method  is 
completely  stable  but  becomes  quite  damped  for  increasing  time  step;  however, 
one  of  the  computational  solutions  eventually  becomes  unstable  near  kAt  =  0.73 
(Fig.  lb).  In  contrast,  since  the  original  LF2  scheme  is  symplectic  (i.e.,  it  exactly 
conserves  all  Lagrangian  integral  invariants),  it  is  completely  undamped  (Fig. 
lc).  This  is  true  for  both  the  physical  and  computational  modes.  For  the  physical 
mode  this  is  a  highly  desirable  property  but  not  for  the  computational  mode 
because  it  can  become  excited  through  nonlinear  interactions  with  the  physical 
mode  and  eventually  become  unstable  (see  Sanz-Serna  1985  and  Aoyagi  1995).  For 
this  reason,  in  the  geophysical  fluid  dynamics  community  LF2  is  typically  used 
in  conjunction  with  a  time  filter  (Asselin  1972)  which  damps  the  computational 
mode  while  selectively  modifying  the  physical  mode  (Fig.  Id).  This  is  by  no 
means  the  only  choice  available  for  eradicating  the  computational  mode.  In  the 
astrophysics  community,  which  places  much  importance  on  symplecticness  (which 
is  ideal  for  Hamiltonian  systems  such  as  the  first  order  HPE)  the  approach  has 
been  to  introduce  second  order  Runge-Kutta  smoothers  (see  Aoyagi  1995  and 
New  et  al.  1998)  which  obviate  the  need  for  time- filters;  however,  in  the  present 
work  we  only  consider  LF2  with  the  Asselin  time-filter. 

Based  on  the  above  discussion  it  is  safe  to  conclude  that  BDF2B  and  LF2 
with  e  =  0.05  are  the  most  stable  schemes  shown  in  Fig.  1.  Let  us  now  compare 
these  two  methods.  Clearly,  the  ideal  method  would  be  one  whereby  the  physical 
solution  is  undamped  while  the  computational  solution  is  damped.  BDF2B  and 
LF2  with  e  =  0.05  damp  both  the  physical  and  computational  solutions.  The 
question  we  now  try  to  answer  is  how  much  unwanted  dissipation  do  these  two 
methods  introduce?  Assuming  that  1%  numerical  dissipation  is  an  acceptable 
level  (i.e.,  the  amplification  factor  approaches  0.990)  we  find  that  BDF2B  reaches 
this  value  at  kAt  =  0.38  while  LF2  with  e  =  0.05  at  kAt  =  0.58.  Thus  BDF2B  is 
much  more  dissipative  than  LF2  with  e  =  0.05.  However,  for  e  =  0.1  LF2  reaches 
this  level  at  kAt  =  0.41  which  is  as  dissipative  as  BDF2B.  In  the  present  work  we 
use  e  =  0.05  for  LF2  for  most  of  the  simulations  but  for  very  long  time-integrations 
(such  as  the  Held-Suarez  test  case)  we  use  e  =  0.1. 

In  Fig.  2  we  show  the  stability  region  for  the  implicit  formulations  of  BDF2 
and  LF2.  For  the  fully  implicit  formulation  (5  =  1)  BDF2A  and  BDF2B  collapse 
to  the  classical  BDF2  method  found  in  numerical  analysis  text  books  (e.g.,  Gear 
1971  page  217).  The  reason  there  is  no  computational  mode  for  BDF2  in  Fig.  2a 
is  because  the  amplification  factor  of  this  mode  remains  below  0.32  and  hence 
is  not  visible  in  this  amplified  plot.  Clearly  BDF2  and  LF2  are  unconditionally 
stable.  However,  this  stability  is  achieved  at  the  price  of  damping  the  physical 
solution.  Once  again,  if  we  take  1%  dissipation  as  an  acceptable  level  we  find  that 
BDF2  reaches  it  at  kAt  =  0.51  (Fig.  2a)  while  LF2  with  e  =  0.05  and  ■&  =  0.5  at 
kAt  =  0.73  (Fig.  2b).  For  e  =  0.1  and  ■&  =  0.5  LF2  reaches  this  level  at  kAt  =  0.46 
(Fig.  2c)  which  is  now  below  BDF2.  Changing  the  implicit  weight  of  LF2  has  even 
more  drastic  consequences.  For  example,  for  e  =  0.05  and  r)  =  0.6  LF2  reaches  1% 
dissipation  at  kAt  =  0.22  (Fig.  2d).  In  fact  for  this  choice  of  e  and  -d  LF2  is  more 
dissipative  than  BDF2  for  the  entire  range  of  kAt  values.  Values  of  t)  >  0.5  are 
not  uncommon  in  NWP  models  and  in  fact  are  often  used  to  off-center  semi- 
Lagrangian  schemes  in  order  to  eliminate  unwanted  noise  near  steep  topography 
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Figure  2.  The  stability  of  the  implicit  versions  of  a)  BDF2,  b)  LF2  with  e  =  0.05  and  i?  =  0.5,  c)  LF2 
with  e  =  0.1  and  $  =  0.5,  and  d)  LF2  with  e  =  0.05  and  $  =  0.6. 


(see  Rivest  et  al.  1994,  Cote  et  al.  1995,  and  Davies  et  al.  2005).  BDF2  has  the 
advantage  of  being  naturally  off-centered  and  therefore  able  to  avoid  the  spurious 
resonant  response  induced  by  orographic  forcing. 

In  summary,  LF2  with  e  =  0  and  •d  —  0.5  is  a  better  method  than  BDF2 
because  it  is  non-dissipative;  however,  as  shown  by  Sanz-Serna  (1985)  LF2 
can  become  unstable  due  to  the  nonlinear  interaction  between  the  physical 
and  computational  modes  even  when  linear  stability  analysis  shows  otherwise. 
By  choosing  e  >  0  and/or  d  >  0.5  LF2  becomes  more  stable  but  at  the  price 
of  becoming  dissipative  and  losing  its  second  order  accuracy.  Thus  if  one  is 
willing  to  accept  a  small  amount  of  dissipation,  have  off-centering  built  into 
the  time-integrator,  but  would  like  to  retain  second  order  accuracy  then  BDF2 
is  a  reasonable  choice.  The  stability  analysis  also  shows  that  due  to  its  larger 
explicit  stability  region  LF2  admits  a  larger  time  step  than  BDF2,  at  least 
for  the  Rossby  waves.  However,  as  we  shall  see  in  Sec.  3  (b)  ,  a  larger  time 
step  does  not  necessarily  translate  into  a  faster  model  -  at  least  not  for  pure 
dynamics  (i.e.,  no  diabatic  forcing).  The  success  of  BDF2  for  the  shallow  water 
equations  using  a  spectral  element  semi-Lagrangian  method  (Giraldo  et  al. 
2003),  the  incompressible  Navier-Stokes  equations  using  Eulerian  time-integrators 
(Karniadakis  et  al.  1991),  semi-Lagrangian  methods  (Xiu  and  Karniadakis  2001), 
and  operator-integration-factor  splitting  methods  (Maday  et  al.  1990)  has  been 
the  main  motivation  for  considering  BDF2  for  the  semi-implicit  version  of 
NSEAM.  We  now  turn  to  the  construction  of  the  Helmholtz  operator  resulting 
from  the  generalized  semi-implicit  time-integration. 
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( b )  The  3D  Pseudo- Helmholtz  Operator 

After  application  of  the  semi-implicit  method,  as  outlined  in  Appendix  A 
for  the  generalized  2nd  order  form  given  in  Eq.  (8),  we  obtain  the  following  3D 
pseudo-Helmholtz  equation 

$ >l  ~  A 2Vlm  (M-1DTVM~1D$)m  =  8,  -  \Vhm  {. M~1DTVu)m  (13) 

for  the  variable  which  is  a  linear  combination  of  the  potential  temperature  and 
surface  pressure  semi-implicit  corrections  (see  Eq.  (A.  12)).  To  properly  explain 
the  solution  strategy  of  this  3D  pseudo-Helmholtz  problem,  a  detailed  description 
of  Eq.  (13)  is  first  in  order. 

In  Eq.  (13)  M  and  D  are  the  mass  and  differentiation  matrices  resulting 
from  the  spectral  element  discretization,  V  is  the  projection  matrix  which 
constrains  the  3D  Cartesian  velocities,  and  V  is  the  matrix  containing  the 
vertical  contribution  of  the  semi-implicit  method  (see  Eq.  (A.  14)).  At  this  point 
we  have  only  used  subscripts  for  the  matrices  corresponding  to  the  vertical 
discretization.  In  Eq.  (13)  V  is  an  JViev  x  N\ew  matrix  with  /,  m  =  1,  ...,  N\ew 
which  is  defined  for  every  horizontal  grid  point  i  =  1,  , ,  ,  .Np  where  N\ew  are 
the  number  of  vertical  layers  in  the  model  and  Np  are  the  total  number  of 
horizontal  grid  points.  Thus  the  matrix  V  couples  every  vertical  layer  at  each 
horizontal  grid  point.  The  mass  matrix,  M,  and  the  differentiation  matrix,  D. 
are  Np  x  Np  matrices  with  i.  j  =  1,  ...,  Np  and  thus  only  involve  the  horizontal 
direction.  Furthermore,  the  differentiation  matrix  D  is  a  vector  of  matrices  such 
that  D  =  ( Dxi  +  Dyj  +  Dzk)  where  (*,  j.  k)  are  the  Cartesian  directional  unit 
vectors,  and  Ds  denotes  the  differentiation  matrix  along  the  s  direction. 

To  further  simplify  the  discussion  let  us  define  two  additional  horizontal 
matrices:  let 

A  =  M~1DT'PM~lD  (14) 

represent  the  discrete  pseudo-Laplacian  operator  and 

Hfj  =  M~1DtV  (15) 

the  discrete  divergence  operator.  Substituting  Eqs.  (14)  and  (15)  into  Eq.  (13), 
factoring  terms,  and  including  subscripts  for  the  matrices  corresponding  to  the 
horizontal  and  vertical  discretizations  yields 

\ll,m  ®  Ii,j  —  A  (Ej,m  <8  H^ j ) ]  n  =  *&l.i  ~  A  (V^m  <8>  uj,m  (16) 

where  <g)  denotes  the  tensor  product,  I^m  and  Il:J  are  identity  matrices  associated 
with  the  vertical  and  horizontal  discretizations,  respectively.  Let  us  now  describe 
some  important  details  of  the  matrix  problem  defined  in  Eq.  (16).  The  tensor 
product  Ii  m  0  I,  j  results  in  the  identity  matrix  which  is  an  Nfev  x  Np 

matrix;  the  same  is  true  for  the  tensor  products  of  V  with  II L  and  HIJ .  Thus  if 
we  replaced  the  left  hand  side  matrix  of  Eq.  (16),  which  is  in  the  square  brackets, 
by  II  U)  then  it  becomes  evident  that  the  product  of  Hffm  ■  with  the  solution 
matrix  &j,m  results  in 

Gl.i  = 

where  G  is  an  N\ev  x  Np  matrix  that  spans  the  entire  3D  domain  because 
l  =  1,  ...,  Aiev  spans  the  vertical  and  i  =  1,  ...,  Np  spans  the  horizontal  directions. 
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At  this  point,  the  Helmholtz  operator  of  the  semi-implicit  formulation  given 
in  Eq.  (16)  yields  a  fully  3D  matrix.  Below  we  describe  the  vertical  mode 
decomposition  which  transforms  the  3D  HPE  into  a  series  of  2D  shallow  water 
equations  which  are  then  solved  much  more  efficiently. 


(c)  Vertical  Mode  Decomposition 

Note  that  the  matrix  HL  in  Eq.  (16)  is  completely  independent  from  the 
vertical  direction  while  the  the  matrix  V  is  independent  from  the  horizontal 
direction;  however,  the  tensor  product  of  V  and  II L  results  in  a  fully  3D  system. 
Thus  to  reduce  this  intimidatingly  large  3D  matrix  problem  we  apply  a  mode 
decomposition  of  the  matrix  V  in  the  vertical  direction.  Upon  completion  of  this 
decomposition  the  full  3D  Helmholtz  matrix  problem  will  be  converted  into  a 
series  of  N\ew  2D  Helmholtz  problems. 

In  order  to  perform  this  vertical  decomposition  we  write  the  matrix  V  in  the 
canonical  form 

V  =  RAR-1 

where  A  are  the  eigenvalues  of  V  and  R  are  its  associated  right  eigenvectors  which 
satisfy 

VR  =  RA. 


Note  that  this  eigenvalue  problem  is  not  computationally  expensive  because  V  is 
an  iViev  x  iViev  matrix  which  only  needs  to  be  decomposed  once.  Upon  obtaining 
this  vertical  mode  decomposition  we  simply  left-multiply  Eq.  (16)  by  R~ 1  to  yield 


fe  -  (Rd*kj)T  =  [-R,;*1  ($t,i  -  A(Vi,„  ®  .  (17) 

Letting 

we  then  obtain  the  following  series  of  2D  Helmholtz  problems 

(lid  -  A2 A iHtj)  =  [r^  -  m,m  ®  H^)ujjm)]T 


(18) 


for  the  variable  &R.  Note  that  for  each  of  the  l  —  1,  ...,  JViev  modes  Eq.  (18) 
represents  an  individual  2D  Helmholtz  problem  of  size  Np  x  Np  which  is  in  fact 
analogous  to  a  shallow  water  model  scaled  by  the  eigenvalue  Ap 

Figure  3a  shows  that  the  eigenvalues,  A,  of  the  six  fastest  vertical  modes 
decrease  exponentially.  In  fact,  the  third  eigenvalue  (A2)  has  a  value  of  104  which 
its  square  root  is  already  in  the  range  of  the  highest  horizontal  wind  velocities 
encountered  in  the  atmosphere  (~  100  m/s).  Thus  the  semi-implicit  method  is 
only  required  for  the  vertical  modes  which  have  gravity  wave  speeds  beyond  the 
highest  horizontal  wind  velocities.  For  this  reason  we  only  solve  Eq.  (18)  for  the 
first  four  vertical  modes  in  much  the  same  way  first  proposed  by  Burridge  (1975). 
Figure  3b  shows  the  right  eigenvectors,  R.  as  a  function  of  cr  for  the  three  fastest 
vertical  modes.  Let  us  now  turn  our  attention  to  the  solution  of  the  2D  Helmholtz 
problems. 
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Figure  3.  The  a)  eigenvalues,  A,  of  the  six  fastest  vertical  modes  and  b)  the  right  eigenvectors,  R,  of 

the  three  fastest  vertical  modes. 


(d)  Solution  of  the  2D  Pseudo- Helmholtz  Operator 

From  the  vertical  mode  analysis  it  was  determined  that  only  the  Helmholtz 
operator  corresponding  to  the  first  four  vertical  modes  need  to  be  solved  implicitly 
in  time;  the  remainder  can  be  computed  explicitly.  This  coupling  between  explicit 
and  implicit  methods  can  be  seamlessly  included  into  the  numerical  approach  by 
using  the  explicit  solution,  <h,  as  the  initial  guess  for  the  implicit  solution,  <I>.  This 
is  why  it  is  so  beneficial  to  write  the  semi-implicit  method  as  a  correction  to  the 
explicit  method.  First,  the  explicit  solution  is  obtained  and  then  only  for  the  first 
four  vertical  modes  are  the  semi-implicit  corrections  included.  The  semi-implicit 
correction  is  obtained  by  solving  Eq.  (18)  using  the  generalized  minimum  residual 
method  (GMRES)  with  point  Jacobi  preconditioning,  L-2  projection  for  the  next 
iterate,  and  restarts  every  10  time-steps  (see  Fischer  1998).  There  are  numerous 
other  Krylov  subspace  methods  to  choose  from  but  we  have  decided  on  GMRES 
based  on  previous  experiences  (see  Giraldo  et  al.  2003).  In  addition,  there  are 
also  more  elaborate  preconditioners  and  we  shall  report  on  our  experiences  with 
methods  such  as  overlapping  Schwarz  (see  Pavarino  2002)  in  future  work. 

The  number  of  GMRES  iteration  required  for  convergence  are  dependent  on 
the  stopping  criterion,  est0p,  and  the  size  of  A2  A  in  Eq.  (18);  the  smaller  estop  or  the 
larger  A2A  the  more  iterations  required.  The  value  of  est0p  is  defined  by  the  user 
and  there  are  numerous  strategies  for  choosing  this  value  which  are  beyond  the 
scope  of  the  present  work.  It  turns  out  that  the  value  of  A2A  is  dependent  on  the 
eigenmode  (more  iterations  are  required  for  the  external  mode),  the  coefficients 
7  and  p-\  of  the  semi-implicit  method  (see  Eq.  (A. 6)),  and  of  course  the  time- 
step.  It  is  shown  in  Sec.  3  ( b )  that  increasing  the  time-step  by  a  given  factor 
does  not  mean  that  an  efficiency  gain  of  this  size  will  be  achieved.  Similarly,  one 
time-integrator  running  with  a  larger  time-step  may  not  be  more  efficient  than 
another  time-integrator  running  with  a  smaller  time-step.  The  overall  efficiency 
of  the  model  is  determined  by  the  scaling  A2A  which  affects  the  condition  number 
of  the  resulting  Helmholtz  problem. 

Upon  obtaining  the  solution  for  we  then  left-multiply  by  R  to  obtain  <I>. 
that  is, 

$  =  R$r. 

Once  $  is  obtained  we  then  solve  for  utt  via  Eq.  (A.  15).  With  this  value  of  utt 
known  we  can  then  solve  for  i tr  via  Eq.  (A. 7)  and  Or  via  Eq.  (A. 9).  Finally,  the 
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prognostic  variables  are  extracted  from  qtt  via  Eq.  (12).  This  then  concludes  the 
solution  strategy  for  each  time-step. 


Figure  4.  Three  of  the  many  possible  grids  that  NSEAM  can  use.  They  include:  a)  icosahedral,  b) 
hexahedral,  and  c)  telescoping  grids.  The  three  grids  have  approximately  the  same  number  of  grid  points 
as  the  spectral  grid  T239  where  the  49  high-order  grid  points  inside  each  quadrilateral  (N  =  6)  have 

been  omitted  for  clarity. 


(a)  Model  Scalability 

One  of  the  main  advantages  of  using  spectral  element  (SE)  methods  over 
spectral  transform  (ST)  methods  is  that  for  an  equivalent  grid  resolution  the  SE 
method  allows  the  use  of  far  more  processors.  Assuming  that  a  ID  decomposition 
along  latitude  rings  is  the  most  efficient  decomposition  for  ST  models  (as  in 
the  U.  S.  Navy’s  operational  NWP  model,  NOGAPS),  the  maximum  number  of 
processors  that  a  ST  model  can  use  is 

N*J0C  =  Nlat*^T  (19) 

where  N\at  denotes  the  number  of  latitude  rings  and  T  the  resolution  of  the 
spectral  triangular  truncation.  In  contrast,  on  a  hexahedral  grid  (see  Fig.  4b) 
with  Np  =  6(n^flV)2  +  2  grid  points  and  Ne  =  6 n2H  elements  (where  n#  and  N 
are  the  number  of  elements  in  each  of  the  x ,  y  directions  on  each  of  the  six  faces 
of  the  hexahedron  and  the  polynomial  order,  respectively)  the  maximum  number 
of  processors  that  a  SE  model  can  use  is 

iYproc  =  iVe  =  ^2  (20) 

where  H  =  nnN  represents  the  hexahedral  horizontal  resolution.  In  other  words 
a  SE  model  can  use  as  many  processors  as  there  are  elements.  Thus  for  fixed 
N  the  number  of  processors  allowed  by  a  SE  model  increases  quadratically  with 
resolution,  II.  while  only  linearly  for  a  ST  model.  Using  the  approximation  based 
on  equivalent  number  of  grid  points  given  by 
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a  spectral  resolution  of  T239  translates  to  a  hexahedral  resolution  of  H208  which 
for  simplicity  we  round-up  to  H216  (e.g.,  nn  —  27  N  —  8,  nn  =  36  N  —  6,  or 
nn  =  54  N  —  4).  From  Eq.  (21)  we  see  that  the  resolution  H216  is  equivalent  to 
T249.  However,  due  to  the  h-p  nature  of  the  SE  method  it  can  achieve  a  specified 
grid  resolution  by  either  increasing  the  number  of  elements  (h)  or  the  polynomial 
order  (p).  Thus  it  is  important  to  state  the  polynomial  order  used  to  obtain  a 
specified  grid  resolution  and  so  for  the  resolution  nn  —  36  N  —  6  we  denote  it  as 
T249  N6  in  order  to  distinguish  it  from  other  possible  resolutions  such  as  nn  —  27 
N  =  8  (T249  N8)  or  nH  =  54  N  =  4  (T249  N4). 

At  T239  (which  is  the  current  operational  resolution  used  by  NOGAPS)  a  ST 
model  can  use  360  processors.  At  T249  N6  a  SE  model  can  use  7700  processors; 
a  twenty-fold  increase  in  the  number  of  processors.  Equation  (20)  shows  that  if 
we  wish  to  further  increase  the  number  of  processors  of  the  SE  model  we  simply 
increase  nn  while  decreasing  N  accordingly  in  order  to  maintain  the  horizontal 
resolution  fixed.  Therefore  at  T249  N4  NSEAM  can  accommodate  well  over  17,000 
processors;  a  forty-fold  increase  in  the  number  of  processors.  However,  decreasing 
N  will  impact  the  solution  accuracy  and  the  issue  of  efficiency  versus  accuracy 
must  be  carefully  weighed.  The  point  here  is  that  the  SE  method  offers  this 
flexibility  to  increase  either  the  accuracy  or  efficiency  -  a  luxury  shared  by  neither 
the  spectral  transform  nor  finite  difference  methods. 

( b )  Comparison  of  BDF2  and  LF2  Semi- Implicit  Time-Integrators 

Efficiency  is  arguably  one  of  the  most  important  criterion  for  determining 
whether  a  specific  algorithm  will  be  included  in  an  operational  NWP  model.  In 
Fig.  5  we  show  the  performance  of  NSEAM  using  the  BDF2B  and  LF2  semi- 
implicit  time-integrators  using  double  precision  on  an  IBM  SP4  P690  with  a 
clockspeed  of  1.3  GHz;  this  P690  was  reported  as  the  56th  fastest  computer  in 
the  world  at  SuperComputing  2004.  The  results  for  BDF2A  are  quite  similar  to 
those  for  BDF2B  and  so  we  shall  refer  to  the  BDF  methods  simply  as  BDF2.  The 
resolution  is  T249  N6  L30  with  At  —  300  seconds  for  BDF2  (solid  line)  and  LF2 
(dotted  line)  and  At  =  200  seconds  for  BDF2  (dashed  line).  The  results  in  this 


Figure  5.  Efficiency  of  BDF2  and  LF2  semi-implicit  methods  for  NSEAM  T249  N6  L30  on  an  IBM  SP4 

(P690  with  1.3  GHz  clockspeed). 
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figure  show  that  the  BDF2  semi-implicit  formulation  is  more  efficient  than  LF2 
even  when  BDF2  uses  a  time  step  50%  smaller.  The  reason  for  this  is  quite  simple: 
the  2D  Helmholtz  problem,  given  in  Eq.  (18),  which  must  be  solved  at  each  time 
step  only  differs  for  the  two  methods  by  the  parameter  A.  For  LF2  it  is  equal  to 
2$A t  and  for  BDF2  it  is  |Ai.  Thus  we  can  see  that  the  matrix  corresponding 
to  BDF2  is  more  diagonally  dominant  than  that  for  LF2  (since  d  >  ^),  which 
gives  BDF2  a  smaller  condition  number  than  LF2.  A  smaller  condition  number 
means  that  the  iterative  solver  will  require  fewer  iterations  for  a  given  convergence 
tolerance;  in  Fig.  5,  on  average,  BDF2  required  10  GMRES  iterations  per  time 
step  while  LF2  required  21.  This  difference  in  GMRES  iterations  allows  BDF2  to 
use  a  smaller  time  step  and  still  perform  as  efficiently  as  LF2.  Another  interesting 
result  is  the  performance  of  BDF2.  For  this  method  using  a  time-step  50%  smaller 
(dashed  line)  the  decrease  in  performance  was  not  significant  especially  at  high 
processor  counts.  BDF2  with  At  =  200  seconds  required  fewer  GMRES  iterations 
than  with  At  =  300  seconds.  Clearly,  this  has  a  dramatic  effect  on  performance 
especially  at  high  processor  counts. 

(c)  Comparison  of  NSEAM  and  NOGAPS  Dynamical  Cores 

Figure  6  shows  a  comparison  between  the  dynamical  cores  (i.e.,  no  diabatic 
forcing)  of  NSEAM  and  NOGAPS  on  the  IBM  SP4  P690  using  single  precision. 
For  NOGAPS  we  use  the  current  operational  resolution  T239  L30  while  for 
NSEAM  we  use  T249  N6  L30  with  the  BDF2B  time-integrator.  At  T239  L30 
the  maximum  time  step  that  NOGAPS  can  use  is  300  seconds.  At  this  time  step 
using  96  processors  NSEAM  is  60%  faster  than  NOGAPS;  this  gap  between  the 
two  models  will  continue  to  widen  with  increased  resolution  and/or  processor 
count  in  favor  of  NSEAM  as  predicted  by  Eqs.  (53)  and  (54)  in  Giraldo  and 
Rosmond  (2004). 


Figure  6.  Simulation  days  per  wallclock  hours  for  the  dynamical  cores  of  NSEAM  (T249  N6  L30  using 
BDF2B)  and  NOGAPS  (T239  L30)  on  an  IBM  SP4  (P690  with  1.3  GHz  clockspeed)  using  At  =  300 
seconds.  (The  performance  data  for  NOGAPS  is  courtesy  of  Tim  Hogan.) 


It  is  expected  that  the  resolution  of  NOGAPS  will  increase  to  T479  L60  in  the 
near  future.  Thus  in  Fig.  7  we  show  the  performance  of  NSEAM  at  the  resolutions 
T249  N6  L30  (with  At  =  300  seconds)  and  T498  N6  L60  (with  At  =  150  seconds) 
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on  an  IBM  SP4  P655  with  a  clockspeed  of  1.7GHz  (the  P655  was  reported  as 
the  9th  fastest  computer  in  the  world  at  Supercomputing  2004).  Going  from  a 
resolution  of  T249  N6  L30  to  T498  N6  L60  increases  the  problem  size  by  a  factor  of 
16;  decreasing  the  spacing  in  all  three  directions  by  one  half  increase  the  number 
of  grid  points  by  a  factor  of  8  and  decreasing  the  time  step  by  half  increases  the 
number  of  time  steps  for  one  simulation  day  by  another  factor  of  2.  Comparing 
the  performance  of  NSEAM  T249  N6  L30  (Fig.  7a)  with  T498  N6  L60  (Fig.  7b) 
shows  that  the  difference  in  performance  is  in  fact  less  than  the  expected  value 
of  16.  This  implies  that  NSEAM  becomes  more  efficient  for  large  problem  sizes 
and  high  processor  counts.  This  can  be  seen  in  Fig.  7b  which  shows  the  linear 
scalability  of  NSEAM  T498  N6  L60.  The  largest  cost  incurred  by  NSEAM  is  in 
the  solution  of  the  2D  Helmholtz  problem.  However,  for  30,  60,  or  90  vertical  levels 
NSEAM  only  solves  a  Helmholtz  problem  for  the  first  four  vertical  modes  thereby 
getting  the  remaining  vertical  modes  virtually  at  no  cost  (the  cost  incurred  by 
the  remaining  vertical  modes  comes  in  via  the  explicit  solution).  This  behavior 
is  observed  in  Fig.  7  where  one  can  see  that  at  384  processors  the  difference  in 
performance  between  T249  N6  L30  and  T498  N6  L60  is  approaching  a  factor  of 
8;  thus  the  vertical  solution  is  achieved  at  almost  no  cost. 

The  scalability  studies  have  shown  that  the  NSEAM  model  performs  ex¬ 
tremely  well,  however,  there  are  regions  of  the  model  which  can  still  be  improved 
to  further  increase  its  performance.  Because  NSEAM  is  currently  only  a  research 
tool  it  has  been  designed  with  flexibility  in  mind.  For  example,  the  grids  and  the 
corresponding  domain  decomposition  (DD)  are  generated  automatically  within 
the  code.  This  consumes  precious  CPU  time  which  could  be  avoided  if  the  grid 
and  its  DD  were  generated  off-line  and  then  read  in  as  input.  This  feature  is 
currently  being  implemented  into  the  next  version  of  NSEAM  which  will  have 
terrain,  diabatic  forcing,  and  non-reflecting  boundary  conditions.  Another  likely 
candidate  for  improvement  is  the  solution  of  the  external  mode  which  requires 
far  more  GMRES  iterations  than  the  remaining  modes.  It  is  conjectured  that  the 
external  mode  can  be  solved  much  more  efficiently  by  using  a  coarse  grid  in  a 
multi-grid  approach.  However,  this  topic  is  reserved  for  future  research. 


a)  b) 

Figure  7.  Simulation  days  per  wallclock  hours  for  the  dynamical  core  of  NSEAM  (using  BDF2B)  on 
an  IBM  SP4  (P655  with  1.7  GHz  clockspeed)  using  a)  T249  N6  L30  with  At  =  300  seconds  and  b)  T498 

N6  L60  with  At  =  150  seconds. 
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4.  Results 

The  test  cases  used  to  validate  NSEAM  consist  of  five  baroclinic  tests.  The 
difficulty  with  quantifying  the  error  and/or  accuracy  of  baroclinic  models  is  that 
analytic  solutions  are  difficult  to  obtain.  Instead  we  view  the  following  test  cases 
as  a  means  for  qualitative  comparisons  to  show  that  NSEAM  gives  similar  results 
to  existing  models.  In  order  to  judge  and  compare  the  accuracy  of  NSEAM  we 
plot  normalized  L-2  error  norms  defined  as  follows 


L2 


'/A  (gexact  ~q)  d A 

/a  ^exact/^ 


(22) 


where  q  is  the  computed  solution  vector,  qex act  is  the  exact  solution,  and  A 
represents  the  surface  area  of  the  Earth. 


(a)  Rossby-Haurwitz  Wave  Number  4 

In  this  test  case  (see  Giraldo  and  Rosmond  2004)  we  track  the  propagation  of 
Rossby-Haurwitz  waves  during  a  five  day  period.  Surface  pressure  contours  after 
a  5  day  integration  for  NSEAM  T185  N8  L24  with  At  =  300  seconds  for  BDF2B 
and  LF2  semi-implicit  formulations  are  shown  in  Fig.  8.  The  results  between  the 
two  methods  are  virtually  indistinguishable;  however,  BDF2B  required  33%  fewer 
iterations  per  time  step  to  converge. 


Figure  8.  Rossby-Haurwitz  Wave  Number  4:  The  surface  pressure  contours  in  hPa  for  NSEAM  T185 
N8  L24  using  the  a)  BDF2B  and  b)  LF2  semi-implicit  time-integrators. 


( b )  Polvani  et  al.  Baroclinic  Instability  with  Diffusion 

For  this  test  case  (see  Polvani  et  al.  2004),  the  atmosphere  is  initially 
balanced  and  a  perturbation  is  added  to  the  flow  which  begins  the  motion  of 
the  atmosphere.  This  perturbation  drives  the  atmosphere  towards  a  singularity; 
however,  in  order  to  avoid  this  unpleasantness  a  diffusion  operator  is  added  to 
the  momentum  and  thermodynamic  equations  with  viscosity  7.0  x  10-5  to2  j sec. 
Figure  9  shows  the  surface  temperature  as  a  function  of  longitude  and  latitude 
at  day  12  of  the  integration  for  the  BDF2B  and  LF2  semi-implicit  methods. 
The  results  are  essentially  identical  between  the  two  methods.  In  addition, 
these  results  compare  extremely  well  with  the  Geophysical  Fluid  Dynamics 
Laboratory’s  (GFDL)  spectral  transform  model  (see  Polvani  et  al.  2004).  In  fact, 
at  this  resolution  NSEAM  has  converged  to  the  correct  solution  regardless  of 
which  semi-implicit  method  is  used.  Further  increases  in  either  horizontal  or 
vertical  resolution  show  no  discernible  difference  in  the  solution. 


SEMI-IMPLICIT  TIME-INTEGRATORS  FOR  A  SE  ATMOSPHERIC  MODEL 


17 


Figure  9.  Polvani  et  al.  Test  Case:  The  surface  temperature  of  NSEAM  T74  N8  L20  after  12  days  using 
the  a)  BDF2B  and  b)  LF2  semi-implicit  time-integrators. 


(c)  Jablonowski-Williamson  Balanced  Initial  State 

For  this  test  case  (see  Jablonowski  and  Williams  2002),  the  atmosphere 
is  initially  balanced  and  should  remain  so  indefinitely.  Figure  10  shows  the 
normalized  7r  L-2  error  norm  as  a  function  of  time  for  a  30  day  period  for  NSEAM 
with  the  semi-implicit  BDF2A  (solid  line),  semi-implicit  BDF2B  (dashed-dotted 
line),  semi-implicit  LF2  (dotted  line),  and  explicit  LF2  (dashed  line)  methods 
for  the  resolution  T185  N8  L26;  the  explicit  method  used  in  this  figure  is  in  fact 
the  model  described  in  Giraldo  and  Rosmond  (2004).  Note  that  all  the  methods 
use  different  time  steps  with  the  explicit  using  the  smallest  (42  seconds)  and  the 
semi-implicit  LF2  the  largest  (270  seconds).  Even  though  the  semi-implicit  BDF2 
and  LF2  methods  use  time  steps  much  larger  than  the  explicit  method  the  error 
norms  are  quite  similar  which  confirms  that  the  semi-implicit  solutions  are  as 
accurate  as  the  explicit  one.  It  is  surprising,  however,  that  both  semi-implicit 
BDF2  methods  yield  virtually  indistinguishable  results  from  the  explicit  method 
but  that  the  semi-implicit  LF2  method  does  not.  For  this  case  LF2  uses  a  time 
step  50%  larger  than  BDF2B  but  is  only  210  seconds  faster  for  the  entire  30  day 
simulation.  The  advantages  of  the  larger  time  step  for  LF2  are  somewhat  offset 
by  requiring  50%  more  iterations  per  time  step  than  BDF2B. 

(d)  Jablonowski-Williamson  Baroclinic  Instability 

This  case  is  similar  to  the  balanced  initial  state  except  that  now  a  per¬ 
turbation  is  added  to  the  initial  zonal  velocity.  This  perturbation  grows  until 
a  baroclinic  instability  develops  near  day  nine.  Figure  11  shows  the  minimum 
surface  pressure,  ps,  as  a  function  of  time  for  NSEAM  (BDF2B)  against  various 
models  including  the  NCAR  ST  model  (Hack  et  al.  1992),  the  NASA  Goddard 
FV  model  (Lin  and  Rood  1996),  and  the  German  Weather  Service  icosahedral 
FD  model  (Majewksi  et  al.  2002)  which  we  denote  as  GME.  For  the  grid  point 
models,  we  use  the  definition  of  equivalent  triangular  truncation 

1  ~  - . 

3 

The  results  of  this  case  are  summarized  as  follows.  Figure  11  shows  that  all  four 
models  are  in  complete  agreement  until  day  8,  at  which  point  the  two  low-order 
models  (NASA  and  GME)  diverge  from  the  NCAR  and  NSEAM  models.  The 
two  low-order  models,  NASA  and  GME,  show  a  similar  pattern  during  the  14 
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Figure  10.  Jablonowski- Williamson  Balanced  Initial  State:  The  normalized  7r  L 2  error  as  a  function  of 
days  for  NSEAM  T185  N8  L26  using  the  semi-implicit  BDF2A  method  with  At  =  135  seconds  (solid  line), 
semi- implicit  BDF2B  method  with  At  =  180  seconds  (dashed-dotted  line),  semi- implicit  LF2  method 
with  At  =  270  seconds  (dotted  line),  and  the  explicit  LF2  method  with  At  =  42  seconds  (dashed  line). 


day  integration  but  do  not  match  exactly.  On  the  other  hand  the  two  high-order 
models,  NCAR  and  NSEAM,  behave  almost  identically  throughout  the  entire  14 
day  simulation.  Having  established  that  NSEAM  using  BDF2B  behaves  similarly 


Figure  11.  Jablonowski- Williamson  Baroclinic  Instability:  The  minimum  surface  pressure  (hPa)  as  a 
function  of  days  for  the  NASA  (finite  volume),  GME  (finite-difference),  NCAR  (spectral  transform),  and 
NSEAM  (spectral  element  with  N=8  using  BDF2B)  models  using  26  vertical  levels.  (The  data  for  the 
last  three  models  are  courtesy  of  Christiane  Jablonowski.) 

to  other  well-established  climate  and  NWP  models  we  compare  various  semi- 
implicit  time-integrators  of  NSEAM. 

In  Fig.  12  we  plot  the  minimum  surface  pressure  as  a  function  of  days  for 
NSEAM  T185  N8  L26  using  the  semi-implicit  BDF2A  (solid  line),  semi-implicit 
BDF2B  (dotted  line),  semi-implicit  LF2  (dashed-dotted  line),  and  the  explicit 
LF2  (dashed  line).  Even  though  all  the  models  use  different  time  steps  they 
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all  agree  rather  well  confirming  that  the  semi-implicit  time-integrators  are  as 
accurate  as  the  explicit  method. 


Figure  12.  Jablonowski-Williamson  Baroclinic  Instability:  The  minimum  surface  pressure  (hPa)  as  a 
function  of  days  for  NSEAM  T185  N8  L26  using  the  semi-implicit  BDF2A  with  At  =  135  seconds  (solid 
line),  semi- implicit  BDF2B  method  with  At  =  180  seconds  (dotted  line),  semi- implicit  LF2  method  with 
At  =  270  seconds  (dashed-dotted  line),  and  the  explicit  LF2  method  with  At  =  42  seconds  (dashed  line). 


(e)  Held- Suarez  Mean  Planetary  Climate 
For  this  test  case  (see  Held  and  Suarez  1994),  the  atmosphere  is  initially 
at  rest  and  a  perturbation  is  added  to  the  flow  which  begins  the  motion  of 
the  atmosphere.  A  forcing  function  mimicking  the  radiation  of  the  sun  near  the 
Equator  drives  the  model  towards  a  realistic  mean  planetary  climate.  NSEAM 
was  run  for  1200  days  with  samples  taken  every  4  days  beginning  from  day 
200.  The  sample  files  from  day  200  to  day  1200  are  then  averaged  to  obtain 
a  temporal  mean.  Figure  13  shows  the  time  and  zonally  averaged  zonal  velocity 
for  the  semi-implicit  BDF2B  (left  panel)  and  LF2  (right  panel)  as  a  function  of 
latitude  and  vertical  coordinate.  Both  models  yield  identical  mid-latitude  jets  in 


Figure  13.  Held-Suarez  Mean  Planetary  Climate:  The  time  and  zonally  averaged  zonal  velocity  of  the 
semi-implicit  a)  BDF2B  and  b)  LF2  methods  as  function  of  latitude  and  vertical  coordinate  for  the 

resolution  T74  N8  L20. 
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the  upper  atmosphere  which  agree  with  the  results  obtained  with  the  explicit 
model  in  Giraldo  and  Rosmond  (2004)  and  the  spectral  transform  model  in  Held 
and  Suarez  (1994). 


5.  Conclusion 

The  Naval  Research  Laboratory’s  spectral  element  atmospheric  model  (NSEAM) 
for  scalable  computer  architectures  was  presented.  NSEAM  is  based  on  a  Carte¬ 
sian  formulation  of  the  equations,  the  spectral  element  (SE)  method  in  space,  and 
a  general  second  order  semi-implicit  method  in  time.  Specifically,  semi-implicit 
methods  based  on  backward  difference  formulas  (BDF2)  and  leapfrog  (LF2)  were 
compared.  The  stability  analysis  showed  that  for  reasonable  values  of  the  Asselin 
time-filter,  e,  LF2  is  less  dissipative  than  BDF2  as  long  as  the  implicit  weight  ,  i9, 
is  equal  to  0.5;  if  d  >  0.5  then  LF2  becomes  more  dissipative  than  BDF2.  For  pure 
dynamics  BDF2  was  shown  to  be  as  accurate  as  LF2  while  being  more  efficient 
even  though  it  required  a  smaller  time-step.  This  difference  in  efficiency  is  due 
to  the  resulting  Helmholtz  matrix  having  a  smaller  condition  number  for  BDF2 
than  LF2  which  then  translates  into  fewer  iterations  per  time  step  to  converge. 

We  showed  that  regardless  of  time-integrator  NSEAM  gives  similar  results 
to  well-established  climate  and  weather  prediction  models  while  scaling  quite 
efficiently  on  distributed-memory  computers.  At  a  resolution  of  T249  L30  using 
96  processors,  NSEAM  was  shown  to  be  60%  faster  than  a  spectral  transform 
model  and  this  gap  will  continue  to  grow  in  favor  of  NSEAM  as  the  horizontal 
resolution  and  the  number  of  processors  are  increased.  At  T498  L60  NSEAM  was 
shown  to  scale  linearly  up  to  384  processors  -  a  feat  impossible  to  achieve  by  a 
spectral  transform  model.  Because  SE  models  are  constructed  completely  around 
basis  functions  this  offers  attractive  flexibilities  not  shared  by  other  methods. 
The  shape,  order,  and  characteristics  of  the  grid,  polynomials,  and  model  can  be 
altered  merely  by  changing  the  basis  functions.  This  flexibility  allows  an  SE  model 
to  adapt  to  the  changing  needs  in  science  and  computing  throughout  its  lifetime. 
Finally,  the  advantage  of  using  Cartesian  coordinates  with  the  SE  method  is  that 
the  model  becomes  completely  independent  from  the  grid.  This  means  that  any 
type  of  grid  can  be  used  with  NSEAM.  While  we  have  only  shown  results  on 
hexahedral  grids,  the  extension  to  icosahedral,  telescoping,  and  adaptive  grids 
is  immediately  obvious.  Various  improvements  in  the  accuracy,  efficiency,  and 
flexibility  of  NSEAM  are  currently  underway  and  the  results  of  this  research  will 
be  reported  in  the  future. 
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Semi- Implicit  Formulation 

(a)  Linearization  of  the  Nonlinear  Gravity  Wave  Terms  SG 
The  terms  in  Eq.  (4)  are  linearized  in  the  following  manner.  First,  7r  and  9 
are  linearized  about  the  reference  states  7r*  =  1000  hPa  and 


P  (tt*) 

with  T*  =  300°  Kelvin  (see  Temperton  et  al.  2001).  With  these  reference  states 
defined  we  can  now  proceed  with  the  linearization  of  the  terms  in  Eq.  (4). 

(i)  Surface  Pressure  and  Thermodynamic  Equations 

Based  on  the  previously  defined  reference  states,  the  surface  pressure  and 
thermodynamic  equations  are  linearized  in  the  following  simple  form 

SG( tt)  —  -7T*  (v  ■  u  +  and  SG(9)  —  —cr 

respectively. 

(ii)  Momentum  Equation 

In  order  to  linearize  the  pressure  gradient  the  geopotential  height  itself  must 
be  linearized.  Beginning  with  the  finite  differenced  equation  for  the  geopotential 
height 

$1  ~  4>l+i  —  cp@l  {Pl+1/2  ~  Pi)  +  Cp0i+i  {Pi+ 1  —  -P7+1/2)  ■ 

where  l  —  1,  ...,  N\ev  with  iViev  representing  the  number  of  vertical  levels,  we  can 
then  take  a  Taylor  series  expansion  about  the  reference  states  ir*  and  9*  yielding 


-  <t>l+ 1 

+ 


(-^+1/2  Pi  )  +  cp@l+i  (Pl+ 1  -^+1/2) 


'OP, 


Cp9t 


1+ 1/2 


dir 


d_P[_ 

dir 


{ir  7r* )  T  cp9*l+1 


Equation  (Al)  can  be  written  in  matrix  form  as 

Ai,k4> k  —  Pl,k@k  +  Ci(ir  -  ir*) 


where 


f  1  if  k  —  l 

1  ljk=i  -1  if  k  =  l  +  1 

l  0  if  k  <1  or  k  >  l  +  1 


Pl.k  — 


>*  \ 
1  ) 


P*  _  P 

z+ 1/2  Jri 


T3*  _  T3* 

W+l  {+1/2 


if  k  =  l 

)  if  k  =  l  +  1 

if  k  <  l  or  k  >  l  +  1 


Q 


Cp9\ 

cp9\ 


'9P« 


+  1/2 


dn 

>dP:+i/2  _  dp;\ 
(fn  dir  J 


dP*\  c  Q*  (  dP*+ 1  _  9K+i/2  \  •£  7  <  JVi 

d-K  )  +  CP°l+ 1  ^  d-K  dn  )  11  1  <  iVlev 

if  l  =  iVlev 


(A.l) 


ap,*+i  app2\,  ., 

~ai - aP 
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and  the  range  of  the  indices  are  k,  l,  m  —  1, N\ev .  This  results  in  the  following 
geopotential  gradient 


—  V  ^ Ap^Bkm9m  +  App,Ck( 


7T  —  7T 


where 


A!"1  -  I  1  1 

"V  -  \  0  i 


if  k  >1 
if  k  <1 


Linearizing  the  gradient  due  to  surface  pressure  yields 

=  {^aSrv'n+i 


To  simplify  the  description  of  the  semi-implicit  method  let  us  create  the 
following  definitions:  let 

dP* 

E l,k  —  ■^i!m^m,k  and  Fj  —  Ck  +  Cpdt  i 

which  allows  the  gradient  of  geopotential  and  surface  pressure  to  be  written  as 

dP 

V&  +  CpOi-^j-V-K  =  V  {Eitkek  +  Ft 7r)  . 

We  are  now  ready  to  construct  the  linear  operator  LG{q)  of  Eq.  (5). 


( b )  Linear  Operator  and  Implicit  Correction 

With  the  linearizations  described  above  and  dropping  the  subscripts,  the 
linear  operator  LG  in  Eq.  (5)  can  be  written  as  follows 


/  (V  •«.  +  §*)  \ 


LG(q)  =  - 


V 


V  ( ED  +  Fn) 


roo* 

da 


(A.2) 


J 


Using  this  linearization  we  can  now  write  the  equations  in  terms  of  the  semi- 
implicit  correction  as  follows 


xv.  B 

=  7T  -  XlT*  (  V  •  Utt  +  (<T«) 


Utt  =  u-  AV  {Edit  +  F^tt)  -  nx 


(A.3) 

(A.4) 


(BO*  \ 

°tt~do)  (A-5) 

where 

A  =  S'yAtp-i  (A.  6) 

and  q  and  qtt  are  defined  in  Eqs.  (11)  and  (12). 
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(c)  Construction  of  the  3D  Pseudo -Helmholtz  Operator 

Replacing  the  continuous  spatial  operators  in  Eqs.  (A. 3),  (A. 4),  and  (A. 5) 
by  their  discrete  spectral  element  counterparts  results  in 

nt  =  n - V  M~1DTuttkAak  (A. 7) 

as-aTti 

Utti  —  Vui  —  XVM  lD  (EijkOttk  +  Fmt)  (A. 8) 

9ttl  =  0,  -  X sljk  {- M-lDTutt)k  (A. 9) 

where 

Si,k  —  @iANl^,k  -  i/2Aijk  -  @i_1/2Ai_ (A. 10) 


©/  —  0/+1/2 


al+ 1/2  —  aT 

as  —  ax 


+  ©/- 1/2 


° 7—1/2  —  °T 

as  —  ax 


7 


0 


1+ 1/2 


n*  _  a* 

U 1+1/2  al 

al+ 1/2  —  al  ' 


J  Aak  if  k  <  l 
\  0  if  k>l 


0 


l- 1/2 


o*  o* 

d 1-1/2 

17 1  ~  &1-1/2  ’ 


,  Aak  —  ak+ij2  —  ak_1/2i 


and  M  is  the  mass  matrix,  D  is  the  differentiation  matrix,  and  V  is  the  projection 
matrix  which  constrains  the  Cartesian  velocity  field  to  remain  tangential  to  a 
sigma  surface  (for  details  on  the  SE  discretization  see  Giraldo  and  Rosmond 
2004) .  To  simplify  the  description  of  the  semi-implicit  formulation  we  only  include 
subscripts  for  the  matrices  involving  the  vertical  discretization  because  these  are 
the  terms  to  which  the  vertical  mode  decomposition  is  applied. 

Multiplying  Eq.  (A. 9)  by  E  and  Eq.  (A. 7)  by  F  and  adding  gives 


*i  =  %-  \E^kSk,m  ( M-1DTutt)m  -  A  Fx-  - ANlwm  ( M-1DTutt)m 

(A.ll) 

where 

=  Eijk9ttk  +  Ftnu,  (A.  12) 

—  Ei,k9k  +  Fpr, 

and  as  and  ax  are  the  sigma  values  at  the  surface  and  top  of  the  model.  Let  us 
next  factor  M-1  DTUtt  from  Eq.  (A.ll)  and  rewrite  the  equation  as 

=  m,m  ( M~1DTutt)m  (A. 13) 

where 

7T* 

Vi,m  ~  Ei  kSk  m  +  Fi  A^  m  (A. 14) 

as-aT 

is  the  matrix  containing  the  vertical  contribution  to  the  semi-implicit  formulation. 
Note  that  Eq.  (A. 8)  can  now  be  written  as 


Utti  =  V  ( ut  -  AM  1D$i)  . 


Substituting  Eq.  (A. 15)  into  Eq.  (A. 13)  yields 

~  A 2Vi,m  ( M-1DTVM~1D$)m  =  8,  -  XVLm  {M~lDTVu)m  . 


(A.15) 
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